

PROTECTING YOUR NETWORK

Richard Johnson ToorCon San Diego 2016









#### Introduction

- Richard Johnson
  - Research Manager
  - Cisco Talos
- Team
  - Aleksandar Nikolich
  - Ali Rizvi-Santiago
  - Marcin Noga
  - Piotr Bania
  - Tyler Bohan
  - Yves Younan
- Special Contributor
  - Andrea Allevi

- Talos Vulndev
  - Third party vulnerability research
    - 170 bug finds in last 12 months
      - Microsoft
      - Apple
      - Oracle
      - Adobe
      - Google
      - IBM, HP, Intel
      - 7zip, libarchive, NTP
  - Security tool development
    - Fuzzers, Crash Triage
  - Mitigation development
    - FreeSentry

#### Introduction

- Agenda
  - Tracing Applications
  - Guided Fuzzing
  - Binary Translation
  - Hardware Tracing
- Goals
  - Understand the attributes required for optimal guided fuzzing
  - Identify areas that can be optimized today
  - Deliver performant and reusable tracing engines



# Applications

- Software Engineering
  - Performance Monitoring
  - Unit Testing
- Malware Analysis
  - Unpacking
  - Runtime behavior
  - Sandboxing
- Mitigations
  - Shadow Stacks
  - Memory Safety checkers



# Applications

- Software Security
  - Corpus distillation
    - Minimal set of inputs to reach desired conditions
  - Guided fuzzing
    - Automated refinement / genetic mutation
  - Crash analysis
    - Crash bucketing
    - Graph slicing
    - Root cause determination
  - Interactive Debugging



# Tracing Engines

#### OS Provided APIs

- Debuggers
  - ptrace
  - dbgeng
  - signals
- Hook points
  - Linux LTT(ng)
  - Linux perf
  - Windows Nirvana
  - Windows AppVerifier
  - Windows Shim Engine
  - Check out Alex Ionescu's RECON 2015 talk

- Performance counters
  - Linux perf
  - Windows PDH



# Tracing Engines

- Binary Instrumentation
  - Compiler plugins
    - gcc-gcov
    - llvm-cov
  - Binary translation
    - Valgrind
    - DynamoRIO
    - Pin
    - DynInst
    - Frida and others



# Tracing Engines

- Native Hardware Support
  - Single Step / Breakpoint
  - Intel Branch Trace Flag
  - Intel Last Branch Record
  - Intel Branch Trace Store
  - Intel Processor Trace
  - ARM CoreSight











#### **Evolutionary Testing**

- Early work was whitebox testing
- Source code allowed graph analysis prior to testing
- Fitness based on distance from defined target
- Complex fitness landscape
  - Difficult to define properties that will get from A to B
- Applications were not security specific
  - Safety critical system DoS



# Guided Fuzzing

- Incrementally better mutational dumb fuzzing
- Trace while fuzzing and provide feedback signal
- Evolutionary algorithms
  - Assess fitness of current input
  - Manage a pool of possible inputs
- Focused on security bugs



#### Sidewinder

- Embleton, Sparks, Cunningham 2006
- Features
  - Simple genetic algorithm approach
    - crossover, mutation, fitness
  - Mutated context free grammar instead of sample fuzzing
  - Markov process for fitness
    - Analyzes probability of path taken by sample
  - Block coverage via debugger API
    - Reduced overhead by focusing on subgraphs





#### Sidewinder

- Embleton, Sparks, Cunningham 2006
- Contributions
  - Genetic algorithms for fuzzing
  - Markov process for fitness
  - System allows selection of target code locations
- Observations
  - Never open sourced
  - Interesting concepts not duplicated





# Evolutionary Fuzzing System

- Jared DeMott 2007
- Features
  - Block coverage via Process Stalker
    - Windows Debug API
    - Intel BTF
  - Stored trace results in SQL database
    - Lots of variables required structured storage
  - Traditional genetic programming techniques
    - Code coverage + diversity for fitness
    - Sessions
    - Pools
    - Crossover
    - Mutation



# Evolutionary Fuzzing System

- Jared DeMott 2007
- Contributions
  - First open source implementation of guided fuzzing
  - Evaluated function vs block tracing
    - For large programs found function tracing was equally effective
    - Likely an artifact of doing text based protocols
- Observations
  - Academic
    - Approach was too closely tied to traditional genetic algorithms
    - Not enough attention to performance or real world targets
    - Only targeted text protocols



# American Fuzzy Lop

- Michal Zalewski 2013
  - Bunny The Fuzzer 2007
- Features
  - Block coverage via compile time instrumentation
  - Simplified approach to genetic algorithm
    - Edge transitions are encoded as tuple and tracked in global map
    - Includes coverage and frequency
  - Uses variety of traditional mutation fuzzing strategies
  - Dictionaries of tokens/constants
  - First practical high performance guided fuzzer
  - Helper tools for minimizing test cases and corpus
  - Attempts to be idiot proof
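
The edge tuple tracking listed above can be sketched roughly as follows. This is a minimal illustration modeled on the scheme described in AFL's public technical notes (the current block ID XORed with a shifted previous block ID indexes a 64 KB byte map). The names `coverage_map`, `trace_block`, and `edges_hit`, and the block IDs themselves, are illustrative assumptions rather than AFL's actual source.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1u << 16)          /* 64 KB map, one byte per edge slot */

static uint8_t  coverage_map[MAP_SIZE];
static uint16_t prev_location;       /* shifted ID of the previous block */

/* Reset per-execution state before running a new input. */
static void trace_reset(void) {
    memset(coverage_map, 0, sizeof(coverage_map));
    prev_location = 0;
}

/* Called at the start of every instrumented block. cur_location is a
 * (hypothetical) random ID assigned to that block at compile time. */
static void trace_block(uint16_t cur_location) {
    /* XOR of source and destination IDs selects the edge slot;
     * the byte counts how often the edge was taken (wraps at 256). */
    coverage_map[cur_location ^ prev_location]++;
    /* Shifting keeps A->B distinct from B->A, and A->A from B->B. */
    prev_location = cur_location >> 1;
}

/* Number of distinct edge slots hit by this execution. */
static unsigned edges_hit(void) {
    unsigned n = 0;
    for (unsigned i = 0; i < MAP_SIZE; i++)
        if (coverage_map[i])
            n++;
    return n;
}
```

Because each slot stores a hit count, the map carries both coverage and frequency, which matches the "coverage and frequency" bullet above.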



# American Fuzzy Lop

- Michal Zalewski 2013
  - Bunny The Fuzzer 2007
- Contributions
  - Tracks edge transitions
    - Not just block entry
  - Global coverage map
    - Generation tracking
  - Fork server
    - Reduce fuzz target initialization
  - Persistent mode fuzzing
  - Builds corpus of unique inputs reusable in other workflows

```
american fuzzy lop 0.47b (readpng)

run time        : 0 days, 0 hrs, 4 min, 43 sec
last new path   : 0 days, 0 hrs, 0 min, 26 sec
last uniq crash : none seen yet
last uniq hang  : 0 days, 0 hrs, 1 min, 51 sec
cycles done     : 0
total paths     : 195
uniq crashes    : 0
uniq hangs      : 1
now processing  : 38 (19.49%)
paths timed out : 0 (0.00%)
map density     : 1217 (7.43%)
count coverage  : 2.55 bits/tuple
now trying      : interest 32/8
stage execs     : 0/9990 (0.00%)
total execs     : 654k
exec speed      : 2306/sec
favored paths   : 128 (65.64%)
new edges on    : 85 (43.59%)
bit flips       : 88/14.4k, 6/14.4k, 6/14.4k
byte flips      : 0/1804, 0/1786, 1/1750
arithmetics     : 31/126k, 3/45.6k, 1/17.8k
known ints      : 1/15.8k, 4/65.8k, 6/78.2k
havoc           : 34/254k, 0/0
trim            : 2876 B/931 (61.45% gain)
```



# American Fuzzy Lop

- Michal Zalewski 2013
  - Bunny The Fuzzer 2007
- Observations
  - KISS works when applied to guided fuzzing
  - Performance top level priority in design
    - Source instrumentation can't be beat
    - Evolutionary system hard to beat without greatly increasing complexity / cost
  - Simple to use, finds tons of bugs
  - Fostered a user community
    - Developer contributions somewhat difficult
  - Current state of the art due to good engineering and feature set
  - Only mutational fuzzer system to have many third-party contributions
    - Binary support via QEMU and Dyninst

# honggfuzz

- Robert Swiecki 2010
  - Guided fuzzing added in 2015
- Features
  - Block coverage
    - Hardware performance counters
    - ASanCoverage
  - Bloom filter for trace recording
  - User-supplied mutation functions
  - Linux, FreeBSD, OSX, Cygwin support
- Contributions
  - First guided fuzzer to focus on hardware tracing support
- Observations
  - Naive seed selection for most algorithms, only the elite survive (OTTES)
    - Some modes use bloom filter
  - Easy to extend, active development
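
A Bloom filter for recording which trace values have been seen might look like the sketch below. This is not honggfuzz's implementation: the filter size, the two hash mixers, and every name here are assumptions chosen only to illustrate the idea of a compact, probabilistic "seen before?" set.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS (1u << 20)        /* filter size in bits (assumed) */

static uint8_t bloom[BLOOM_BITS / 8];

/* Two cheap 64-bit mixers standing in for independent hash functions. */
static uint32_t hash1(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
    return (uint32_t)(x % BLOOM_BITS);
}

static uint32_t hash2(uint64_t x) {
    x ^= x >> 29; x *= 0xc4ceb9fe1a85ec53ULL; x ^= x >> 32;
    return (uint32_t)(x % BLOOM_BITS);
}

/* Set one bit and report whether it was already set. */
static bool test_and_set(uint32_t bit) {
    uint8_t mask = (uint8_t)(1u << (bit & 7));
    bool was_set = (bloom[bit >> 3] & mask) != 0;
    bloom[bit >> 3] |= mask;
    return was_set;
}

/* Record a branch or block address.
 * Returns true if the address was definitely not seen before;
 * false means "probably seen" (false positives are possible). */
static bool bloom_add(uint64_t addr) {
    bool seen1 = test_and_set(hash1(addr));
    bool seen2 = test_and_set(hash2(addr));
    return !(seen1 && seen2);
}
```

The trade-off is memory versus precision: a fixed-size filter never grows with trace length, at the cost of occasionally treating a new address as already seen.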



#### Choronzon

- Features
  - Brings back specific genetic programming concepts
  - Contains strategies for dealing with high level input structure
    - Chunk based
    - Hierarchical
    - Containers
  - Format aware serialization functionality
  - Uses DBI engines for block coverage (Pin / DynamoRIO)
  - Attempts to be cross-platform
- Contributions
  - Reintroduction of more complex genetic algorithms
  - Robust handling of complex inputs through user supplied serialization routines
- Observations
  - Performance not a focus



#### Honorable mentions

- autodafe
  - Martin Vuagnoux 2004
  - First generation guided fuzzer using pattern matching via API hooks
- Blind Code Coverage Fuzzer
  - Joxean Koret 2014
  - Uses off-the-shelf components to assemble a guided fuzzer
    - radamsa, zzuf, custom mutators
    - drcov, COSEINC RunTracer for coverage
- covFuzz
  - Atte Kettunen 2015
  - Simple node.js server for guided fuzzing
  - custom fuzzers, ASanCoverage



# Guided Fuzzing

- Required
  - Fast tracing engine
    - Block based granularity
  - Fast logging
    - Memory resident coverage map
  - Fast evolutionary algorithm
    - Minimum of global population map, pool diversity
- Desired
  - Portable
  - Easy to use
  - Helper tools
  - Grammar detection
- AFL and Honggfuzz still most practical options
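
The "required" pieces above (a memory resident coverage map, a global population map, and a pool of inputs) fit together roughly as in the following sketch. It is a simplified illustration, not the logic of any particular fuzzer; `global_map`, `pool`, and `consider_input` are hypothetical names, and the fitness rule (keep an input only if it adds coverage) is the simplest possible choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 65536
#define POOL_MAX 1024

static uint8_t global_map[MAP_SIZE];   /* union of all coverage seen so far */
static const void *pool[POOL_MAX];     /* inputs worth mutating further */
static size_t pool_len;

/* Merge one execution's trace into the global map.
 * Returns true if the trace touched a slot never seen before. */
static bool merge_if_new(const uint8_t *trace) {
    bool is_new = false;
    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (trace[i] && !global_map[i]) {
            global_map[i] = 1;
            is_new = true;
        }
    }
    return is_new;
}

/* Fitness decision: an input joins the pool only if it adds coverage. */
static bool consider_input(const void *input, const uint8_t *trace) {
    if (!merge_if_new(trace) || pool_len == POOL_MAX)
        return false;
    pool[pool_len++] = input;
    return true;
}
```

A fuzzing loop would pick an entry from `pool`, mutate it, run the target under the tracing engine, and pass the resulting trace to `consider_input`.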











#### Binary Translation

- Binary translation is a robust program modification technique
  - JIT for hardware ISAs
- General overview is straightforward
  - Copy code to cache for translation
  - Insert instructions to modify original binary
  - Link blocks into traces
- Performance comes from smart trace creation
  - Originally profiling locations for hot trace
  - Early optimizations in Dynamo from HP
    - Next Executing Tail
    - Traces begin at backedge or other trace exit
  - Ongoing optimization work happens here
    - VMware Early Exit guided



# Binary Translation

- Advantages
  - Supported on most mainstream OS/archs
  - Can be faster than hardware tracing
  - Can easily be targeted at certain parts of code
  - Can be tuned for specific applications
- Disadvantages
  - Performance overhead
    - Introduces additional context switch
  - ISA compatibility not guaranteed
  - Not always robust against detection or escape



# Valgrind

- Obligatory slide
- Lots of deep inspection tools
- VEX IR is well suited for security applications
- Slow and Linux only, DynamoRIO good replacement
- Many cool tools already exist
  - Flayer
  - Memgrind



#### Pin

- "DBT with training wheels"
- Features
  - Trace granularity instrumentation
    - Begin at branch targets, end at indirect branch
  - Block/instruction level hooking supported
  - Higher level C++ API w/ helper routines
  - Closed source
- Observations
  - Delaying instrumentation until trace generation is slower
  - Seems most popular with casual adventurers
  - Limited inlining support
  - Fewer tuning options
  - Cannot observe blocks added to cache so 'hit trace' not possible






# DynamoRIO

- "A connoisseur's DBT"
- Features
  - Block level instrumentation
    - Blocks are directly copied into code cache
  - Direct modification of IL possible
  - Portable
    - Linux, Windows, Android
    - x86/x64, ARM
  - C API / BSD Licensed (since 2009)
- Observations
  - Much more flexible for block level instrumentation
  - Performance is a priority, Windows is a priority
  - Powerful tools like Dr Memory
    - Shadow memory, taint tracking
    - Twice as fast as Valgrind memcheck



# DynamoRIO

#### Example

```
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating)
{
    instr_t *instr, *first = instrlist_first(bb);
    uint flags;
    /* Our inc can go anywhere, so find a spot where flags are dead. */
    for (instr = first; instr != NULL; instr = instr_get_next(instr)) {
        flags = instr_get_arith_flags(instr);
        /* OP_inc doesn't write CF but not worth distinguishing */
        if (TESTALL(EFLAGS_WRITE_6, flags) && !TESTANY(EFLAGS_READ_6, flags))
            break;
    }
    /* instr now marks where the counter increment can be inserted;
     * NULL means no dead-flags spot was found in this block. */
    return DR_EMIT_DEFAULT;
}
```




# DynInst

- "Static rewriting IS possible!"
- Features
  - Static rewriting support
    - Dynamically linked binaries only
    - Eliminates issues with instruction cache misses common to DBT engines
  - Function level analysis
    - Tools must manually walk Dyninst provided CFG to instrument blocks
  - Modular C++ API / LGPL
- Observations
  - Fastest binary instrumentation out there
  - Development is slow
    - Patches we sent in for PE relocation support still not merged
  - Building Dyninst is NP-Hard
    - Use my Dockerfile on github.com/talos-vulndev/afl-dyninst



# DynInst

#### Example

```
bool insertBBCallback(BPatch_binaryEdit *appBin, BPatch_function *curFunc,
                      char *funcName, BPatch_function *instBBIncFunc, int *bbIndex)
{
    unsigned short randID;
    BPatch_flowGraph *appCFG = curFunc->getCFG();
    BPatch_Set<BPatch_basicBlock *> allBlocks;
    BPatch_Set<BPatch_basicBlock *>::iterator iter;
    /* Populate the set before iterating; it is empty otherwise. */
    if (!appCFG || !appCFG->getAllBasicBlocks(allBlocks))
        return false;
    for (iter = allBlocks.begin(); iter != allBlocks.end(); iter++) {
        unsigned long address = (*iter)->getStartAddress();
        randID = rand() % USHRT_MAX;
        BPatch_Vector<BPatch_snippet *> instArgs;
        BPatch_constExpr bbId(randID);
        instArgs.push_back(&bbId);
        /* instArgs is then passed to a call to instBBIncFunc that is
         * inserted at the entry of this block. */
    }
    return true;
}
```






# Tuning Binary Translation

- Only instrument indirect branches
- Delay instrumentation until input is seen
- Only instrument threads that access the data
- Move instrumentation logic to analysis routines
  - Some APIs provide IF-THEN-ELSE analysis with optimization
- Avoid trampolines
  - Be aware of code locality and instruction cache
  - Directly inline instructions, modify AST if possible
- Inject a fork server if repeatedly executing DBT
  - See our turbotrace tool
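
The fork server idea can be illustrated with plain POSIX calls. The sketch below is not the turbotrace or AFL protocol; it only shows the core trick of paying for expensive startup once and then forking a fresh child per test case. `run_forked` and `demo_target` are hypothetical names.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `target` in a forked child so that initialization performed before
 * this call (for example DBT startup) is inherited copy-on-write instead
 * of being repeated. Returns the child's exit code, or -1 on fork failure
 * or abnormal termination (a crash). */
static int run_forked(int (*target)(const char *), const char *input) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(target(input));          /* child: execute one test case */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical target: "crashes" on inputs starting with 'X'. */
static int demo_target(const char *input) {
    if (input[0] == 'X')
        abort();
    return 0;
}
```

A driver would call `run_forked(demo_target, data)` once per generated input; a return value of -1 marks the input as crashing.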









### CPU Event Monitoring

- Modern CPUs contain Performance Monitoring Units (PMU)
- Model Specific Registers (MSR) used for configuration
  - Requires privileged execution (kernel or better) to access
- Types
  - Event Counters
    - Polled on-demand
  - Event Sampling (non-precise)
    - Interrupts triggered when counters hit modulus value
  - Precise Event Sampling (PEBS)
    - Uses 'Debug Store'
    - Physical memory buffers
    - Interrupt when full
- Use Linux perf / pmu-tools to experiment



## Interrupt Programming

- Interrupts are a low level messaging system for system devices
  - CPU Exceptions
    - GPF, SINGLE\_STEP
  - Hardware Interrupts
    - Memory mapped or IRQ based
    - All Device I/O
  - Software Interrupts
    - System calls (int 0x80)
    - Breakpoints
- OS/hypervisor drivers required to configure interrupt handlers
  - Privileged registers or interrupt vector tables



### Interrupt Programming

- Interrupt Service Routines (ISR)
  - Registered by operating systems and drivers as callbacks
- CPU checks for pending interrupts after each instruction when the interrupt flag (IF) in EFLAGS is set
  - cli and sti instructions clear and set IF
- CPU indexes the interrupt vector table to find appropriate handler
  - Context stored / restored while servicing interrupt
- Historically Familiar Interrupts:
  - int 1 Single Step (TF)
  - int 3 Single opcode, specifically designed for debugging
  - int 10h Any Demosceners?
  - int 24h Who remembers? Critical Error Handler: "I/O Device Specific Error Message: Abort, Retry, Ignore, Fail?"



## Interrupt Programming

- Programmer checklist
  - Memory must not be swapped
  - Use static variables if necessary
  - Must wrap functions with assembly
    - disable interrupts
    - push all registers
    - call interrupt handler
    - pop all registers
    - iretd



## It's a Trap

- Single Stepping
  - Enabled by setting the Trap Flag
  - After each instruction, CPU checks flag and fires exception if enabled
  - Accessible from userspace
- Branch Trace Flag
  - Modifies single step behavior to trap on branch
  - Single flag in IA32\_DEBUGCTL MSR
  - Requires kernel privileges to write to MSR
  - Windows includes a mapping from DR7 to set MSR





### IA32\_DEBUGCTL Register

#### MSR Address 0x1d9

- LBR [0] Enable Last Branch Record mechanism
- BTF [1] when enabled with TF in EFLAGS does single stepping on branches
- TR [6] enables Tracing (sending BTMs to system bus)
- BTS [7] enables sending BTMs to memory buffer from system bus
- BTINT [8] full buffer generates interrupt otherwise circular write
- BTS\_OFF\_OS [9] does not count for priv. level 0
- BTS\_OFF\_USR [10] does not count for priv. level 1,2,3
- FRZ\_LBRS\_ON\_PMI [11] freeze LBR stack on a PMI
- FRZ\_PERFMON\_ON\_PMI [12] disable all performance counters on a PMI
- UNCORE\_PMI\_EN [13] uncore counter interrupt generation
- SMM\_FRZ [14] event counters are frozen during SMM
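
The bit positions listed above translate directly into mask constants. The sketch below composes two example values; the macro and function names are illustrative assumptions, and actually writing the value to MSR 0x1d9 requires ring 0 (for example via `wrmsr` in a driver), which is not shown.

```c
#include <stdint.h>

#define MSR_IA32_DEBUGCTL      0x1d9

#define DEBUGCTL_LBR           (1ull << 0)   /* Last Branch Record */
#define DEBUGCTL_BTF           (1ull << 1)   /* single step on branches */
#define DEBUGCTL_TR            (1ull << 6)   /* emit branch trace messages */
#define DEBUGCTL_BTS           (1ull << 7)   /* store BTMs in memory buffer */
#define DEBUGCTL_BTINT         (1ull << 8)   /* interrupt when buffer full */
#define DEBUGCTL_BTS_OFF_OS    (1ull << 9)   /* skip CPL 0 branches */
#define DEBUGCTL_BTS_OFF_USR   (1ull << 10)  /* skip CPL 1-3 branches */

/* Value for user-mode-only BTS tracing into a memory buffer that raises
 * an interrupt when full instead of wrapping around. */
static uint64_t debugctl_bts_user_only(void) {
    return DEBUGCTL_TR | DEBUGCTL_BTS | DEBUGCTL_BTINT | DEBUGCTL_BTS_OFF_OS;
}

/* Value for branch single-stepping: BTF here, plus TF set in EFLAGS. */
static uint64_t debugctl_branch_step(void) {
    return DEBUGCTL_BTF;
}
```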



#### Branch Trace Store

- First generation hardware branch tracing via PMU
- Allows configurable memory buffer for trace storage
- MSR\_IA32\_DS\_AREA MSR defines storage location
- DS\_AREA\_RECORD stored for each branch

```
struct DS_AREA {
    u64 bts_buffer_base;
    u64 bts_index;
    u64 bts_absolute_maximum;
    u64 bts_interrupt_threshold;
    u64 pebs_buffer_base;
    u64 pebs_index;
    u64 pebs_absolute_maximum;
    u64 pebs_interrupt_threshold;
    u64 pebs_event_reset[4];
};
struct DS_AREA_RECORD {
    u64 flags;
    u64 ip;
    u64 regs[16];
    u64 status;
    u64 dla;
    u64 dse;
    u64 lat;
};
```
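
Consuming the BTS portion of the buffer amounts to walking fixed-size records from the buffer base up to the current index. The sketch below assumes the three-field, 64-bit BTS record layout documented in the Intel SDM (branch source, branch target, flags); `bts_record` and `bts_walk` are hypothetical names, and plain pointers stand in for the DS area fields so the logic can run in user space.

```c
#include <stddef.h>
#include <stdint.h>

/* One BTS record in 64-bit mode (layout assumed from the Intel SDM). */
struct bts_record {
    uint64_t from;    /* address of the branch instruction */
    uint64_t to;      /* branch target */
    uint64_t flags;   /* bit 4: branch was predicted */
};

/* Visit every record between the buffer base and the current index.
 * `base` and `index` mirror the BTS buffer base and BTS index values
 * held in the DS area. Returns the number of records seen. */
static size_t bts_walk(const struct bts_record *base,
                       const struct bts_record *index,
                       void (*visit)(uint64_t from, uint64_t to)) {
    size_t n = 0;
    for (const struct bts_record *r = base; r < index; r++, n++)
        if (visit)
            visit(r->from, r->to);
    return n;
}
```

In a driver, the interrupt handler would run such a walk when the threshold is reached and then reset the index to the buffer base.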








#### Branch Trace Store

- Branches in LBR registers spill to DS\_AREA
- Interrupts only when buffer is full
- Steps to enable BTS
  - Allocate memory and set MSR\_IA32\_DS\_AREA
  - Add interrupt handler to IDT
  - Register interrupt vector with APIC
    - apic\_write(APIC\_LVTPC, pebs\_vector);
  - Select events with MSR\_IA32\_EVNTSEL0
    - EVTSEL\_EN | EVTSEL\_USR | EVTSEL\_OS
  - Enable PEBS mode with MSR\_IA32\_PEBS\_ENABLE
  - Enable CPU perf recording with MSR\_IA32\_GLOBAL\_CTRL
- Significantly faster than BTF
- Still impractical for high speed tracing



#### Intel Processor Trace

- Next generation hardware tracing support
  - Introduced in Broadwell/Skylake architecture
  - Per-hardware-thread tracing
- Goal: full system branch tracing with 5-15% overhead
- Software support available in
  - Linux 4.1+ perf subsystem
  - Standalone Linux reference driver simple-pt
  - Intel VTune / System Studio
    - Remote debugging only
  - Talos IntelPT driver!



#### Features

- Can trace \*SMM, HyperVisor, Kernel, Userspace [CPL -2 to 3]
- Logs directly to physical memory
  - Bypasses CPU cache and eliminates TLB cache misses
  - Can be a contiguous segment or a set of ranges
  - Ringbuffer snapshot or interrupt mode supported
- Minimal log format
  - One bit per conditional branch
  - Only indirect branches log dest address
  - Interrupts log source and destination
  - Decoding log requires original binaries and memory map
- Filter logging based on CR3
- Linux can automatically add log to coredump
- GDB Support



- 90+ pages in Intel Software Developer Manuals
- Randomly flipping bits doesn't work here

- Check with CPUID
  - EAX = 0x14 Intel Processor Trace
- Main leaf (ECX = 0)
  - EBX
    - Bit 00: If 1, indicates that IA32\_RTIT\_CTL.CR3Filter can be set to 1, and that the IA32\_RTIT\_CR3\_MATCH MSR can be accessed.
    - Bit 01: If 1, indicates support of Configurable PSB and Cycle-Accurate Mode.
    - Bit 02: If 1, indicates support of IP Filtering, TraceStop filtering, and preservation of Intel PT MSRs across warm reset.
    - Bit 03: If 1, indicates support of MTC timing packet and suppression of COFI-based packets.
  - ECX
    - Bit 00: If 1, tracing can be enabled with IA32\_RTIT\_CTL.ToPA = 1, hence utilizing the ToPA output scheme; IA32\_RTIT\_OUTPUT\_BASE and IA32\_RTIT\_OUTPUT\_MASK\_PTRS MSRs can be accessed.
    - Bit 01: If 1, ToPA tables can hold any number of output entries, up to the maximum allowed by the MaskOrTableOffset field of IA32\_RTIT\_OUTPUT\_MASK\_PTRS.
    - Bit 02: If 1, indicates support of Single-Range Output scheme.
    - Bit 03: If 1, indicates support of output to Trace Transport subsystem.
    - Bit 31: If 1, generated packets which contain IP payloads have LIP values, which include the CS base component.
- Packet Generation (ECX = 1)
  - EAX
    - Bits 2:0: Number of configurable Address Ranges for filtering.
    - Bits 31:16: Bitmap of supported MTC period encodings
  - EBX
    - Bits 15:0: Bitmap of supported Cycle Threshold value encodings
    - Bits 31:16: Bitmap of supported Configurable PSB frequency encodings

- Hardware support detection
  - CPUID with leaf 0x7 indicates support for Intel PT
  - If supported, CPUID with leaf 0x14 can return the supported PT features
- Trace Record Filtering
  - Current Privilege Level (CPL): kernel vs userspace
  - PML4 page table: single process / CR3 (page-table) filtering
  - Instruction pointer: up to 4 ranges of addresses can be specified
- Log Output Configuration
  - Single range
  - Table of Physical Addresses (ToPA)
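
Decoding the CPUID results into capability flags is simple bit testing. The sketch below takes raw register values as arguments so it stays portable; on x86 with GCC or Clang they could be obtained with `__get_cpuid_count` from `<cpuid.h>`. The struct and function names are hypothetical, and the leaf 0x7 bit position (EBX bit 25) comes from the Intel SDM rather than from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

struct pt_caps {
    bool cr3_filter;       /* leaf 0x14, sub-leaf 0, EBX bit 0 */
    bool psb_cyc;          /* EBX bit 1 */
    bool ip_filter;        /* EBX bit 2 */
    bool mtc;              /* EBX bit 3 */
    bool topa;             /* ECX bit 0 */
    bool topa_multi;       /* ECX bit 1 */
    bool single_range;     /* ECX bit 2 */
    bool trace_transport;  /* ECX bit 3 */
    bool lip;              /* ECX bit 31 */
    unsigned addr_ranges;  /* leaf 0x14, sub-leaf 1, EAX bits 2:0 */
};

/* Leaf 0x7, sub-leaf 0: EBX bit 25 reports Intel PT support. */
static bool pt_supported(uint32_t leaf7_ebx) {
    return ((leaf7_ebx >> 25) & 1) != 0;
}

/* Decode leaf 0x14 sub-leaf 0 (EBX, ECX) and sub-leaf 1 (EAX). */
static struct pt_caps pt_decode(uint32_t ebx0, uint32_t ecx0, uint32_t eax1) {
    struct pt_caps c;
    c.cr3_filter      = (ebx0 >> 0) & 1;
    c.psb_cyc         = (ebx0 >> 1) & 1;
    c.ip_filter       = (ebx0 >> 2) & 1;
    c.mtc             = (ebx0 >> 3) & 1;
    c.topa            = (ecx0 >> 0) & 1;
    c.topa_multi      = (ecx0 >> 1) & 1;
    c.single_range    = (ecx0 >> 2) & 1;
    c.trace_transport = (ecx0 >> 3) & 1;
    c.lip             = (ecx0 >> 31) & 1;
    c.addr_ranges     = eax1 & 0x7;
    return c;
}
```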



- Single Buffer Trace Logging
  - Circular or Interrupt modes (Hardware logging support)
  - Reserve memory with MmAllocateContiguousMemory (Windows drivers)
  - Set the proper MSRs
    - MSR\_IA32\_RTIT\_OUTPUT\_BASE
    - MSR\_IA32\_RTIT\_OUTPUT\_MASK\_PTRS
  - Start tracing by setting the "TraceEn" flag in the control register
  - Processor logs in a circular manner unless the interrupt flag is configured



- Table of Physical Addresses (ToPA) Trace Logging
  - For large traces, non-contiguous physical memory must be used
  - ToPA is compatible with Windows Memory Descriptor List (MDL)
  - MDL is a Windows data structure for tracking physical-to-linear address mappings

```
// Grab the physical address:
PHYSICAL_ADDRESS physAddr = MmGetPhysicalAddress(lpBuffVa);
perCpuData.u.Simple.lpTraceBuffPhysAddr = (ULONG_PTR)physAddr.QuadPart;

// Allocate the relative MDL
PMDL pPtMdl = IoAllocateMdl(lpBuffVa, (ULONG)perCpuData.qwBuffSize, FALSE, FALSE, NULL);
if (pPtMdl) perCpuData.pTraceMdl = pPtMdl;
```
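
Building a ToPA table from a list of physical pages could look like the sketch below. The entry layout (END in bit 0, INT in bit 2, STOP in bit 4, size in bits 9:6, base address from bit 12 up) follows the Intel SDM; the function names are hypothetical and this is not the Talos driver's code. Physical addresses are passed in as plain integers, so obtaining them (for example from an MDL) is out of scope here.

```c
#include <stddef.h>
#include <stdint.h>

#define TOPA_END   (1ull << 0)   /* entry holds the address of the next table */
#define TOPA_INT   (1ull << 2)   /* raise a PMI when this region fills */
#define TOPA_STOP  (1ull << 4)   /* stop tracing when this region fills */

/* Build an entry for a region of (4 KiB << size_enc) bytes at phys_addr.
 * The region base must be aligned to the region size. */
static uint64_t topa_entry(uint64_t phys_addr, unsigned size_enc, uint64_t flags) {
    return (phys_addr & ~0xfffull) | ((uint64_t)(size_enc & 0xf) << 6) | flags;
}

/* Fill `table` with one 4 KiB entry per page, then close it with an END
 * entry pointing back at the table itself, which forms a ring buffer.
 * `table` must have room for npages + 1 entries.
 * Returns the number of entries written. */
static size_t topa_build(uint64_t *table, uint64_t table_phys,
                         const uint64_t *page_phys, size_t npages) {
    size_t i;
    for (i = 0; i < npages; i++)
        table[i] = topa_entry(page_phys[i], 0, 0);
    table[i] = (table_phys & ~0xfffull) | TOPA_END;
    return i + 1;
}
```

Setting `TOPA_INT` on the last data entry instead of relying on the ring would give interrupt-driven draining rather than a circular snapshot.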







- Packet Types
  - Packet Stream Boundary (PSB)
    - Heartbeat packet generated at regular intervals (configurable)
  - Paging Information (PIP)
    - Notification of CR3 Page Table changes
  - Timing (TSC, MTC & CYC)
    - Useful for wall-clock comparisons or synchronization of logs across CPU threads
  - Control Flow (TNT, TIP, FUP)
    - TNT Taken/Not-Taken for conditional branches
    - TIP Taken IP address for indirect branches
    - FUP Flow Update
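
To make the "one bit per conditional branch" encoding concrete, here is a sketch of decoding a short (one byte) TNT packet. It assumes the format described in the Intel SDM: bit 0 is 0, the highest set bit is a stop marker, and the bits between hold up to six outcomes with the oldest branch in the most significant position. `tnt8_decode` is a hypothetical helper, not a libipt function.

```c
#include <stdint.h>

/* Decode a short TNT packet byte into out[] (1 = taken, 0 = not taken),
 * oldest branch first. Returns the number of branches decoded, or -1 if
 * the byte is not a short TNT packet. */
static int tnt8_decode(uint8_t byte, uint8_t out[6]) {
    if (byte & 1)                    /* bit 0 must be 0 for short TNT */
        return -1;

    unsigned payload = byte >> 1;    /* stop bit plus up to 6 TNT bits */
    if (payload < 2)                 /* 0x00 is PAD, 0x02 opens an extended opcode */
        return -1;

    int stop = 6;                    /* find the stop bit (highest set bit) */
    while (!(payload & (1u << stop)))
        stop--;

    for (int i = 0; i < stop; i++)   /* bits below the stop bit, MSB first */
        out[i] = (uint8_t)((payload >> (stop - 1 - i)) & 1);
    return stop;
}
```

For example, the byte `0x14` (binary `00010100`) carries three outcomes: not taken, taken, not taken. Turning those bits back into addresses is what requires the original binaries and memory map.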





How to use: Linux perf tools (apt: linux-tools-common)

```
$ perf list | grep intel_pt
  intel_pt//                                         [Kernel PMU event]
$ perf record -e intel_pt//u date
Sun Oct 11 11:35:07 EDT 2015
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.027 MB perf.data ]
$ perf report
# Samples: 1 of event 'instructions:u'
# Event count (approx.): 157207
# Overhead  Command  Shared Object  Symbol
   100.00%  date     libc-2.21.so   [.] _nl_intern_locale_data
            --- _nl_intern_locale_data
                _nl_load_locale_from_archive
                _nl_find_locale
                setlocale
```



How to use: simple-pt reference driver

```
% sptcmd -c tcall taskset -c 0 ./tcall
cpu   0 offset 1027688, 1003 KB, writing to ptout.0
Wrote sideband to ptout.sideband
% sptdecode --sideband ptout.sideband --pt ptout.0 | less
TIME      DELTA  INSNs   OPERATION
frequency 32
          [+0]   [+   1] _dl_aux_init+436
                 [+   6] __libc_start_main+455 -> _dl_discover_osversio
                 [+  13] __libc_start_main+446 -> main
                             main+22 -> f1
                                     f1+9 -> f2
                                     f1+19 -> f2
                             main+22 -> f1
                                     f1+9 -> f2
                                     f1+19 -> f2
                             main+22 -> f1
```





#### Talos IntelPT driver




```
    BOOLEAN bSingleRangeSupport : 1;      // [6] - Single-Range Output Supported
    BOOLEAN bTransportOutputSupport : 1;  // [7] - Output to Trace Transport Subsystem Supported
                                          //       (Setting IA32_RTIT_CTL.FabricEn to 1 is supported)
    BOOLEAN bIpPcksAreLip : 1;            // [8] - IP Payloads are LIP
    BYTE numOfAddrRanges;                 // + 0x01 - Number of Address Ranges
    SHORT mtcPeriodBmp;                   // + 0x02 - Bitmap of supported MTC Period Encodings
    SHORT cycThresholdBmp;                // + 0x04 - Bitmap of supported Cycle Threshold values
    SHORT psbFreqBmp;                     // + 0x06 - Bitmap of supported Configurable PSB Frequency encoding
};
```



#### Intel Processor Trace

```
// Write the target CR3 value
__writemsr(MSR_IA32_RTIT_CR3_MATCH, targetCr3);
// Start tracing:
rtitCtlDesc.Fields.CR3Filter = 1;
rtitCtlDesc.Fields.FabricEn = 0;
rtitCtlDesc.Fields.Os = 0;
rtitCtlDesc.Fields.User = 1;     // Trace the user mode process
rtitCtlDesc.Fields.ToPA = 0;     // We use the single-range output scheme
rtitCtlDesc.Fields.BranchEn = 1;
//if (ptCap.bMtcSupport) {
//    rtitCtlDesc.Fields.MTCEn = 1;
//    rtitCtlDesc.Fields.MTCFreq = 10;
//}
rtitCtlDesc.Fields.TSCEn = 1;
rtitCtlDesc.Fields.TraceEn = 1;  // Switch the tracing to ON dude :-)
__writemsr(MSR_IA32_RTIT_CTL, rtitCtlDesc.All);
```
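
For readers without the driver's bitfield definitions, the same control value can be composed from plain masks. The bit positions and the MSR address 0x570 below are taken from the Intel SDM description of IA32\_RTIT\_CTL; the macro and function names are illustrative assumptions, not the Talos driver's identifiers.

```c
#include <stdint.h>

#define MSR_IA32_RTIT_CTL    0x570

#define RTIT_CTL_TRACEEN     (1ull << 0)
#define RTIT_CTL_OS          (1ull << 2)    /* trace CPL 0 */
#define RTIT_CTL_USER        (1ull << 3)    /* trace CPL > 0 */
#define RTIT_CTL_FABRICEN    (1ull << 6)
#define RTIT_CTL_CR3FILTER   (1ull << 7)
#define RTIT_CTL_TOPA        (1ull << 8)
#define RTIT_CTL_MTCEN       (1ull << 9)
#define RTIT_CTL_TSCEN       (1ull << 10)
#define RTIT_CTL_BRANCHEN    (1ull << 13)

/* User-mode-only, CR3-filtered, single-range tracing with TSC packets:
 * the same field choices as the snippet above (ToPA, Os, FabricEn clear). */
static uint64_t rtit_ctl_user_single_range(void) {
    return RTIT_CTL_CR3FILTER | RTIT_CTL_USER | RTIT_CTL_BRANCHEN |
           RTIT_CTL_TSCEN | RTIT_CTL_TRACEEN;
}
```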



```
C:\code\intelpt>instdrv.exe /I windowsptdriver.sys
C:\code\intelpt>testintelpt.exe c:\windows\system32\notepad.exe
C:\code\intelpt>..\libipt\ptdump pt_dump.bin | findstr /V pad |
00000000000006e8
00000000000006fe  tsc     4e1ef46cbc
0000000000000708  cbr     1f
000000000000070c  psbend
0000000000000716  tsc     4e1ef8afb9
0000000000000ce0
0000000000000cf0  tip     2: ???????4d515400
0000000000000cf5  tnt.8
0000000000000cf8  tip     2: ???????4bb10ca0
0000000000000cfd  tnt.8
0000000000000cfe  tnt.8
0000000000000d00  tip     2: ???????4d515400
0000000000000d05  tnt.8
0000000000000d08  tip     2: ???????1a91e4f0
0000000000000d0d  tnt.8
```





# Outro







#### Conclusion

- Evolutionary algorithms have a lot to offer for automation
  - https://github.com/talos-vulndev/
- Initial investment in development pays dividends
  - Use correct engine for long term deployment
  - Designing tracing engines is not for everyone
- Hardware tracing is approaching software performance
- This code is open source software
  - https://github.com/talos-vulndev/





# Thank You!













# Talos

#### talosintel.com

blog.talosintel.com @talossecurity

@richinseattle rjohnson@moflow.org







